1,273 research outputs found

    An Improvement Study of the Decomposition-based Algorithm Global WASF-GA for Evolutionary Multiobjective Optimization

    The convergence and the diversity of the decomposition-based evolutionary algorithm Global WASF-GA (GWASF-GA) rely on a set of weight vectors that determine the search directions for new non-dominated solutions in the objective space. Although using weight vectors whose search directions are widely distributed may lead to a well-diversified approximation of the Pareto front (PF), this may not be enough to obtain a good approximation for complicated PFs (discontinuous, non-convex, etc.). Thus, we propose to dynamically adjust the weight vectors once GWASF-GA has been run for a certain number of generations. This adjustment is aimed at re-calculating some of the weight vectors, so that search directions pointing to overcrowded regions of the PF are redirected toward parts with a lack of solutions that may be hard to approximate. We test different parameter settings of the dynamic adjustment in optimization problems with three, five, and six objectives, concluding that GWASF-GA performs better when adjusting the weight vectors dynamically than without applying the adjustment. Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech.

    MACOC: a medoid-based ACO clustering algorithm

    The application of ACO-based algorithms in data mining has grown over the last few years, and several supervised and unsupervised learning algorithms have been developed using this bio-inspired approach. Most recent works concerning unsupervised learning have focused on clustering, showing the great potential of ACO-based techniques. This work presents an ACO-based clustering algorithm inspired by the ACO Clustering (ACOC) algorithm. The proposed approach restructures ACOC from a centroid-based technique to a medoid-based technique, where the properties of the search space are not necessarily known. Instead, it relies only on information about the distances among data points. The new algorithm, called MACOC, has been compared against well-known algorithms (K-means and Partition Around Medoids) and against ACOC. The experiments measure the accuracy of the algorithm on both synthetic datasets and real-world datasets extracted from the UCI Machine Learning Repository.
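
The medoid-based idea in the abstract above, working from pairwise distances alone, can be illustrated with a minimal sketch. This is not the MACOC algorithm (the ant-colony search is omitted entirely); the function names `assign_to_medoids` and `best_medoid` and the toy data are assumptions made for illustration only.

```python
def assign_to_medoids(dist, medoids):
    """Assign every point to its nearest medoid using only the distance matrix.

    dist    -- square matrix, dist[i][j] is the distance between points i and j
    medoids -- indices of the points currently acting as cluster representatives
    Returns (labels, cost): the chosen medoid per point and the summed distance.
    """
    labels = [min(medoids, key=lambda m: dist[i][m]) for i in range(len(dist))]
    cost = sum(dist[i][labels[i]] for i in range(len(dist)))
    return labels, cost


def best_medoid(dist, members):
    """Pick the member with the smallest total distance to the other members."""
    return min(members, key=lambda c: sum(dist[c][j] for j in members))


# Toy data (hypothetical): four points on a line. Coordinates are used only
# to build the matrix; the two functions never see them.
positions = [0, 1, 10, 11]
dist = [[abs(a - b) for b in positions] for a in positions]

labels, cost = assign_to_medoids(dist, medoids=[0, 2])
```

Unlike a centroid, a medoid is always one of the data points, so no coordinates or averaging are needed, which is why a distance matrix is sufficient input.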

    A rigorous evaluation of crossover and mutation in genetic programming

    The role of crossover and mutation in Genetic Programming (GP) has been the subject of much debate since the emergence of the field. In this paper, we contribute new empirical evidence to this argument using a rigorous and principled experimental method applied to six problems common in the GP literature. The approach tunes the algorithm parameters to enable a fair and objective comparison of two different GP algorithms, the first using a combination of crossover and reproduction, and the second using a combination of mutation and reproduction. We find that crossover does not significantly outperform mutation on most of the problems examined. In addition, we demonstrate that the use of a straightforward Design of Experiments methodology is effective at tuning GP algorithm parameters.

    Searching for dark clouds in the outer galactic plane I -- A statistical approach for identifying extended red(dened) regions in 2MASS

    [Abridged] Though the exact role of infrared dark clouds in the formation process is still somewhat unclear, they seem to provide useful laboratories to study the very early stages of clustered star formation. Infrared dark clouds have been identified predominantly toward the bright inner parts of the galactic plane. The low background emission makes it more difficult to identify similar objects in mid-infrared absorption in the outer parts. This is unfortunate, because the outer Galaxy represents the only nearby region where we can study effects of different (external) conditions on the star formation process. The aim of this paper is to identify extended red regions in the outer galactic plane based on reddening of stars in the near-infrared. We argue that these regions appear reddened mainly due to extinction caused by molecular clouds and young stellar objects. The work presented here is used as a basis for identifying star forming regions and in particular the very early stages. We use the Mann-Whitney U-test, in combination with a friends-of-friends algorithm, to identify extended reddened regions in the 2MASS all-sky JHK survey. We process the data on a regular grid using two different resolutions, 60" and 90". The two resolutions have been chosen because the stellar surface density varies between the crowded spiral arm regions and the sparsely populated galactic anti-center region. We identify 1320 extended red regions at the higher resolution and 1589 at the lower resolution run. The majority of regions are associated with major molecular cloud complexes, supporting our hypothesis that the reddening is mostly due to foreground clouds and embedded objects. Comment: Accepted for publication in A&A -- 9 pages, 5 figures (+ on-line only tables)
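
As a rough illustration of the Mann-Whitney U-test named in the abstract above, the sketch below computes the U statistic and a normal-approximation z-score in plain Python. The abstract does not say exactly which samples are compared, so comparing star colours in one grid cell against a reference field is an assumption, as are the function names and the colour values. The z-score omits the tie correction.

```python
import math


def mann_whitney_u(x, y):
    """U statistic for sample x: number of pairs (xi, yj) with xi > yj.

    Tied pairs contribute 0.5 each.
    """
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


def z_score(x, y):
    """Normal approximation to the U distribution (no tie correction)."""
    n1, n2 = len(x), len(y)
    mean = n1 * n2 / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    return (mann_whitney_u(x, y) - mean) / sd


# Hypothetical near-infrared colours: one grid cell versus a reference field.
cell = [1.2, 1.5, 1.4, 1.8]
field = [0.8, 0.9, 1.0, 1.1, 0.7]

u = mann_whitney_u(cell, field)
z = z_score(cell, field)
```

A large positive z suggests the cell's stars are systematically redder than the reference sample. With samples this small, exact tables would be preferable to the normal approximation.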

    Determining appropriate approaches for using data in feature selection

    Feature selection is increasingly important in data analysis and machine learning in the big data era. However, how to use the data in feature selection, i.e. using either ALL or PART of a dataset, has become a serious and tricky issue. Whilst the conventional practice of using all the data in feature selection may lead to selection bias, using part of the data may, on the other hand, lead to underestimating the relevant features under some conditions. This paper investigates these two strategies systematically in terms of reliability and effectiveness, and then determines their suitability for datasets with different characteristics. The reliability is measured by the Average Tanimoto Index and the Inter-method Average Tanimoto Index, and the effectiveness is measured by the mean generalisation accuracy of classification. The computational experiments are carried out on ten real-world benchmark datasets and fourteen synthetic datasets. The synthetic datasets are generated with a pre-set number of relevant features, varied numbers of irrelevant features and instances, and different levels of added noise. The results indicate that the PART approach is more effective in reducing the bias when the size of a dataset is small but starts to lose its advantage as the dataset size increases.
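
The abstract above does not define its Average Tanimoto Index. A minimal sketch, assuming the common set-based definition (intersection size over union size, averaged over all pairs of selected feature subsets), might look like this. The function names and the feature labels are hypothetical.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two feature subsets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0  # two empty selections are treated as identical
    return len(a & b) / len(union)


def average_tanimoto(subsets):
    """Mean pairwise Tanimoto similarity over a collection of feature subsets."""
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("need at least two subsets to compare")
    return sum(tanimoto(a, b) for a, b in pairs) / len(pairs)


# Hypothetical feature subsets selected in three runs of one method.
runs = [{"f1", "f2", "f3"}, {"f1", "f2", "f4"}, {"f1", "f2", "f3"}]
stability = average_tanimoto(runs)
```

A value near 1 means the runs select nearly the same features. An inter-method variant would presumably average over subsets chosen by different selection methods rather than by repeated runs of one method, but that reading is also an assumption.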

    A Layer-Wise Information Reinforcement Approach to Improve Learning in Deep Belief Networks

    With the advent of deep learning, the number of works proposing new methods or improving existing ones has grown exponentially in recent years. In this scenario, "very deep" models have emerged, since they are expected to extract more intrinsic and abstract features while supporting better performance. However, such models suffer from the vanishing gradient problem, i.e., backpropagation values become too close to zero in their shallower layers, ultimately causing learning to stagnate. This issue was overcome in the context of convolutional neural networks by creating "shortcut connections" between layers, in a so-called deep residual learning framework. Nonetheless, a very popular deep learning technique called Deep Belief Network still suffers from vanishing gradients when dealing with discriminative tasks. Therefore, this paper proposes the Residual Deep Belief Network, which reinforces information layer by layer to improve feature extraction and knowledge retention, which in turn support better discriminative performance. Experiments conducted over three public datasets demonstrate its robustness concerning the task of binary image classification.

    Estimating the F1 score for learning from positive and unlabeled examples

    Semi-supervised learning can be applied to datasets that contain both labeled and unlabeled instances and can result in more accurate predictions than fully supervised or unsupervised learning when only limited labeled data is available. A subclass of problems, called Positive-Unlabeled (PU) learning, focuses on cases in which the labeled insta

    Randomized Reference Classifier with Gaussian Distribution and Soft Confusion Matrix Applied to the Improving Weak Classifiers

    In this paper, the issue of building the RRC model using probability distributions other than the beta distribution is addressed. More precisely, we propose to build the RRC model using the truncated normal distribution. Heuristic procedures for the expected value and the variance of the truncated normal distribution are also proposed. The proposed approach is tested using an SCM-based model to assess the consequences of applying the truncated normal distribution in the RRC model. The experimental evaluation is performed using four different base classifiers and seven quality measures. The results show that the proposed approach is comparable to the RRC model built using the beta distribution. Moreover, for some base classifiers, the truncated-normal-based SCM algorithm turned out to be better at discovering objects coming from minority classes. Comment: arXiv admin note: text overlap with arXiv:1901.0882

    BGrowth: an efficient approach for the segmentation of vertebral compression fractures in magnetic resonance imaging

    Segmentation of medical images is a critical issue: several processes of analysis and classification rely on this segmentation. With the growing number of people presenting back pain and problems related to it, the automatic or semi-automatic segmentation of fractured vertebral bodies has become a challenging task. In general, those fractures present several regions with non-homogeneous intensities, and the dark regions are quite similar to the structures nearby. Aiming to overcome this challenge, in this paper we present a semi-automatic segmentation method called Balanced Growth (BGrowth). The experimental results on a dataset with 102 crushed and 89 normal vertebrae show that our approach significantly outperforms well-known methods from the literature. We have achieved an accuracy of up to 95% while keeping an acceptable processing time, which is equivalent to that of the state-of-the-art methods. Moreover, BGrowth presents the best results even with a rough (sloppy) manual annotation (seed points). Comment: This is a pre-print of an article published in Symposium on Applied Computing. The final authenticated version is available online at https://doi.org/10.1145/3297280.329972